[pull] master from tensorflow:master by pull[bot] · Pull Request #1688 · GesuBackups/tensorflow

pull · 2026-04-02T19:29:14Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

This CL is part 1 of cl/886092985. It lands the logic for negation (-v0 instead of + -1 * v0) and simplifies map output by omitting empty symbol brackets []. It also deprecates many AffineExpr/Map methods in indexing_map_serialization. Landing this minimizes the number of test updates we have to do in the following CLs. PiperOrigin-RevId: 893497910
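The two printing rules mentioned above (a leading or subtracted term printed as `-v0` rather than `+ -1 * v0`, and no `[]` when a map has no symbols) can be sketched as follows. This is only an illustration: `format_linear_expr` and `format_map` are hypothetical helpers, and the exact spelling of non-unit coefficients is an assumption, not the output of XLA's indexing map serialization.

```python
def format_linear_expr(terms, constant=0):
    """Format a sum of (coefficient, variable) terms plus a constant.

    Hypothetical helper for illustration; not XLA's printer.
    """
    parts = []
    for coeff, var in terms:
        if coeff == 0:
            continue
        magnitude = abs(coeff)
        # Assumed spelling for non-unit coefficients: "<coeff> * <var>".
        body = var if magnitude == 1 else f"{magnitude} * {var}"
        if not parts:
            # First term: attach the sign directly, e.g. "-v0".
            parts.append(f"-{body}" if coeff < 0 else body)
        else:
            # Later terms: print subtraction instead of "+ -1 * v0".
            parts.append(f"- {body}" if coeff < 0 else f"+ {body}")
    if not parts:
        return str(constant)
    if constant != 0:
        parts.append(f"- {-constant}" if constant < 0 else f"+ {constant}")
    return " ".join(parts)


def format_map(dims, symbols, results):
    """Format "(dims)[symbols] -> (results)", omitting "[]" when symbols is empty."""
    head = f"({', '.join(dims)})"
    if symbols:
        head += f"[{', '.join(symbols)}]"
    return f"{head} -> ({', '.join(results)})"
```

For example, `format_linear_expr([(1, "d0"), (-1, "v0")])` yields `d0 - v0`, and `format_map(["d0"], [], ["d0"])` yields `(d0) -> (d0)` with no empty brackets.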

PiperOrigin-RevId: 893502380

…s to avoid instruction cache thrashing. PiperOrigin-RevId: 893502849

…hine. This change introduces a static factory method `TargetMachineOptions::Native()` which automatically populates the target triple, CPU name, and CPU features based on the host machine's characteristics using LLVM's host detection utilities. A test is added to ensure the inferred options match LLVM's host CPU information. PiperOrigin-RevId: 893503067

PiperOrigin-RevId: 893503478

…cessing. This was relied on for using local wheels as overrides for matching requirements defined in lock files, and it will allow us to stop using the purely suggestive `--find-links` that was needed for the 1.8.4 upgrade because of the regression that is fixed in 1.8.5. `pkg @ wheel URL` will be used instead for local wheels again. The 1.8.5 upgrade is a tiny, regression-fix-only upgrade and thus doesn't require any other adjustments: https://rules-python.readthedocs.io/en/latest/changelog.html#v1-8-5 More fix context: openxla/xla@11a2044 openxla/xla@0025bf7 PiperOrigin-RevId: 893537944

PiperOrigin-RevId: 893542078

When the user is not requesting a specific CPU architecture, XLA automatically detects the host architecture and passes this information to FFI-based custom calls and XLA:CPU. So far this detection has ignored architecture features (like SSE, Neon, AVX, etc.). This change adds the missing HW feature detection and also updates the embedded system configs. It also adds a test that ensures the embedded system configs are in sync with the actual systems. The feature detection takes `DebugOptions::xla_cpu_max_isa` into account, which allows the user to limit the feature set to an older generation of CPU to make the binary more portable. PiperOrigin-RevId: 893544830
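The idea of capping detected features at a maximum ISA level can be sketched as below. Everything here is an assumption made for illustration: the `ISA_ORDER` list, the feature spellings, and the `limit_features` helper are hypothetical and do not reflect XLA's actual tables or the accepted values of `xla_cpu_max_isa`.

```python
# Illustrative ordering from oldest to newest ISA extension.
# This is an assumption for the sketch, not XLA's actual table.
ISA_ORDER = ["sse4.2", "avx", "avx2", "avx512f"]


def limit_features(host_features, max_isa=None):
    """Drop ordered ISA features that are newer than max_isa.

    Features that are not part of ISA_ORDER are passed through unchanged.
    With max_isa=None, nothing is filtered.
    """
    if max_isa is None:
        allowed = set(ISA_ORDER)
    else:
        allowed = set(ISA_ORDER[: ISA_ORDER.index(max_isa) + 1])
    return [f for f in host_features if f not in ISA_ORDER or f in allowed]
```

With a host that reports `["sse4.2", "avx", "avx2", "avx512f"]`, calling `limit_features(..., "avx2")` keeps everything up to and including `avx2`, which is the kind of portability trade-off the commit message describes.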

PiperOrigin-RevId: 893562678

Imported from GitHub PR openxla/xla#40311 XLA:GPU switched to structured concurrency with `AsyncStartThunk` and `AsyncDoneThunk`, so this removes a bad experiment with `WaitForStreamsThunk`. Copybara import of the project: -- 303f683519bd73d80b474570cfe63953ccd83e7d by Eugene Zhulenev <ezhulenev@openxla.org>: [xla:gpu] Delete vestigial WaitForStreamsThunk Merging this change closes #40311 PiperOrigin-RevId: 893588227

PiperOrigin-RevId: 893591324

PiperOrigin-RevId: 893592092

There's no reason to execute the generated HLO. This test exercises a utility that generates a specific sequence of XlaOps. This change refactors the test such that we're just comparing the generated HLO against an expected HLO. The expected HLO was obtained from printing the module generated by the old test. PiperOrigin-RevId: 893592373

…further investigation. Reverts 7fc14e3 PiperOrigin-RevId: 893594429

Reverts c6d844d PiperOrigin-RevId: 893602331

… module does not have a name PiperOrigin-RevId: 893611161

PiperOrigin-RevId: 893613883

PiperOrigin-RevId: 893615727

…ologyDescription from proto PiperOrigin-RevId: 893622566

Imported from GitHub PR openxla/xla#40117 This PR updates the XLA Linux x86 GPU oneAPI presubmit coverage by expanding the test scope from //xla/stream_executor/sycl/... and //xla/service/gpu/... to the broader set //xla/..., //build_tools/..., and @tsl//tsl/..., ensuring more comprehensive validation. Accordingly, it switches from _XLA_ONEAPI_TARGET_PATTERNS to _XLA_DEFAULT_TARGET_PATTERNS to align oneAPI presubmit checks with the default XLA test coverage. Copybara import of the project: -- 50e4447d7e31ffd9f71fa20805a5b330089b77da by mraunak <mayank.kumar.raunak@intel.com>: Update build.py -- eb896f9d6386638e8150e815b18ecd0f88235dca by mraunak <mayank.kumar.raunak@intel.com>: Update golden_commands.txt Merging this change closes #40117 PiperOrigin-RevId: 893639525

With this change, the Shardy outliner translates named computations into separate calls, leaving a flat call graph. PiperOrigin-RevId: 893645729

- Moved TransposePlanCache and its mutex from PjRtStreamExecutorClient, PjRtCpuClient, and TpuClient to CommonPjRtClient.
- Added GetTransposePlan to CommonPjRtClient for thread-safe access.
- Updated call sites to use the new centralized interface.

PiperOrigin-RevId: 893657648
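The refactoring pattern described above (one cache and one lock owned by a shared base class, reached through a single thread-safe accessor) can be sketched as follows. The class and method names are hypothetical Python stand-ins, and the "plan" is a placeholder; the real code is C++ in PjRt.

```python
import threading


class CommonClient:
    """Hypothetical stand-in for a shared base class that owns the cache and its lock."""

    def __init__(self):
        self._plan_lock = threading.Lock()
        self._plan_cache = {}

    def get_transpose_plan(self, dims, permutation):
        """Return a cached plan for (dims, permutation), building it on first use."""
        key = (tuple(dims), tuple(permutation))
        with self._plan_lock:  # All cache reads and writes go through one lock.
            plan = self._plan_cache.get(key)
            if plan is None:
                plan = self._build_plan(dims, permutation)
                self._plan_cache[key] = plan
            return plan

    def _build_plan(self, dims, permutation):
        # Placeholder "plan": the output shape after permuting the dimensions.
        return tuple(dims[i] for i in permutation)


class CpuClient(CommonClient):
    """Subclasses no longer carry their own cache or mutex."""


class TpuClient(CommonClient):
    """Same accessor, inherited from the common base."""
```

A call such as `CpuClient().get_transpose_plan([2, 3, 4], [2, 0, 1])` builds the plan once; repeating the call on the same client returns the cached object.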

…alAsync` This fixes a bug where PjRt CPU buffers ignore major-to-minor in the layout inside `ToLiteral`. The CL also makes `CopyToLiteralAsync` perform the work asynchronously as intended by the API. PiperOrigin-RevId: 893674302

PiperOrigin-RevId: 893678182

PiperOrigin-RevId: 893686530

…des on all memory kinds PiperOrigin-RevId: 893691249

This flag preset will continue to be developed with fast compilation times and numerical stability in mind as the top goals (runtime performance only a secondary goal). Expect tradeoffs similar to XX% compilation time for X% runtime to occur under this flag. Currently, it just sets LLVM codegen opt to O1, disables platform dependent math, and sets `flatten_after_fusion` to true. PiperOrigin-RevId: 893693608

…ucket.table. Also, clean up includes. Benchmarks are slightly negative, which is expected because the benchmark doesn't cover the high-contention/very-large-buckets motivating case. Note that the benchmarks are still faster than when we used tsl::Hash64.

```
name             cpu/op         cpu/op         vs base
BM_SendRecv      93.38n ± 2%    99.09n ± 1%    +6.12% (p=0.000 n=20)
BM_RecvSend      76.73n ± 1%    83.00n ± 1%    +8.18% (p=0.000 n=20)
BM_PingPong/100  308.9µ ± 2%    311.7µ ± 2%    ~ (p=0.841 n=20)
BM_PingPong/200  612.4µ ± 3%    614.2µ ± 2%    ~ (p=0.799 n=20)
BM_PingPong/300  929.6µ ± 3%    932.4µ ± 3%    ~ (p=0.968 n=20)
geomean          16.60µ         17.11µ         +3.11%

name             time/op        time/op        vs base
BM_SendRecv      93.59n ± 2%    99.32n ± 1%    +6.12% (p=0.000 n=20)
BM_RecvSend      76.89n ± 1%    83.19n ± 1%    +8.19% (p=0.000 n=20)
BM_PingPong/100  704.2µ ± 1%    693.8µ ± 3%    ~ (p=0.086 n=20)
BM_PingPong/200  1.434m ± 3%    1.393m ± 4%    ~ (p=0.201 n=20)
BM_PingPong/300  2.158m ± 2%    2.120m ± 2%    ~ (p=0.265 n=20)
geomean          27.49µ         27.91µ         +1.53%

name             INSTRUCTIONS/op  INSTRUCTIONS/op  vs base
BM_SendRecv      1.053k ± 0%      1.229k ± 0%      +16.71% (p=0.000 n=20)
BM_RecvSend      833.2 ± 0%       1008.2 ± 0%      +21.00% (p=0.000 n=20)
BM_PingPong/100  539.0k ± 0%      576.2k ± 0%      +6.90% (p=0.000 n=20)
BM_PingPong/200  1.024M ± 0%      1.098M ± 0%      +7.29% (p=0.000 n=20)
BM_PingPong/300  1.507M ± 0%      1.621M ± 0%      +7.55% (p=0.000 n=20)
geomean          59.24k           66.20k           +11.74%

name             CYCLES/op      CYCLES/op      vs base
BM_SendRecv      328.7 ± 2%     348.4 ± 1%     +6.00% (p=0.000 n=20)
BM_RecvSend      269.9 ± 1%     292.0 ± 1%     +8.21% (p=0.000 n=20)
BM_PingPong/100  649.2k ± 1%    650.8k ± 1%    ~ (p=0.841 n=20)
BM_PingPong/200  1.279M ± 1%    1.281M ± 2%    ~ (p=0.968 n=20)
BM_PingPong/300  1.917M ± 1%    1.926M ± 1%    ~ (p=0.369 n=20)
geomean          42.65k         43.92k         +2.97%

name             items/s        items/s        vs base
BM_PingPong/100  323.8k ± 2%    320.8k ± 2%    ~ (p=0.841 n=20)
BM_PingPong/200  326.6k ± 3%    325.6k ± 2%    ~ (p=0.799 n=20)
BM_PingPong/300  322.7k ± 2%    321.7k ± 2%    ~ (p=0.968 n=20)
geomean          324.3k         322.7k         -0.50%
```

PiperOrigin-RevId: 893694334
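As a sanity check on how to read the benchmark output, the `geomean` and `vs base` figures can be approximately recomputed from the displayed values. This is a small sketch that assumes the first value column is the baseline and the second is the new run; because the displayed numbers are rounded, recomputed percentages can differ from the reported ones in the last digit.

```python
import math


def geomean(values):
    """Geometric mean of a list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_change(base, new):
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


# cpu/op baseline column, converted to nanoseconds (n -> 1, µ -> 1e3).
base_cpu_ns = [93.38, 76.73, 308.9e3, 612.4e3, 929.6e3]

print(f"geomean: {geomean(base_cpu_ns) / 1e3:.2f}µ")
print(f"BM_SendRecv cpu/op change: {percent_change(93.38, 99.09):+.2f}%")
```

The geometric mean is used for the summary row because the benchmarks span several orders of magnitude, so an arithmetic mean would be dominated by the `BM_PingPong` entries.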

PiperOrigin-RevId: 893714548

Main goal is to not include Eigen when all we need is error codes. PiperOrigin-RevId: 893722480

PiperOrigin-RevId: 893732146

…tency. PiperOrigin-RevId: 893751639

PiperOrigin-RevId: 893764092

…o IFRT. PiperOrigin-RevId: 893765489

`xla::ifrt::Value::ByteSize()` is a new API that asks the IFRT runtime to compute the byte size of the IFRT value object (an array or an upcoming tuple). This API will provide the user with a fast and accurate way to calculate on-device sizes of IFRT value objects, and the runtime would be responsible for providing a robust implementation of this calculation. Since `xla::ifrt::Array` is a subclass of `xla::ifrt::Value`, `xla::ifrt::Array::ByteSize()` can be used without casting `xla::ifrt::ArrayRef` to `xla::ifrt::ValueRef`. The initial implementation of this method uses `Layout::ByteSize()` or `PjRtLayout::ByteSize()` to compute it on the fly. The implementation currently does not cache or precompute it. PiperOrigin-RevId: 893768289
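The shape of this API (a method declared on the base value type, computed on the fly and usable on an array without any cast) can be sketched as below. This is an illustration under strong assumptions: the classes are hypothetical Python stand-ins, and the size formula assumes a dense layout with no padding or tiling, which the real `Layout::ByteSize()` and `PjRtLayout::ByteSize()` implementations need not share.

```python
from math import prod

# Illustrative subset of element sizes in bytes; an assumption for the sketch.
ELEMENT_BYTES = {"f32": 4, "bf16": 2, "s8": 1}


class Value:
    """Hypothetical stand-in for a base value type exposing byte_size()."""

    def byte_size(self):
        raise NotImplementedError


class Array(Value):
    """Hypothetical array value: a shape plus an element type."""

    def __init__(self, dims, dtype):
        self.dims = tuple(dims)
        self.dtype = dtype

    def byte_size(self):
        # Computed on the fly, not cached. Assumes a dense, unpadded layout.
        return prod(self.dims) * ELEMENT_BYTES[self.dtype]
```

Because `Array` is a subclass of `Value`, `Array((8, 128), "f32").byte_size()` works directly, mirroring the point that no conversion to the base reference type is needed.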

Previously, JAX and StableHLO did not have asynchronous collectives. Thus, every JAX program lowered, via StableHLO, to an HLO program without asynchronous collectives. Recently, we added asynchronous collectives to JAX and StableHLO, but the XLA CPU backend doesn't support asynchronous collectives. If you try to run a JAX program with asynchronous collectives on CPU, it will crash. We can solve this in two ways. (1) We could do nothing and let the program crash. (2) We could replace the asynchronous collectives with synchronous collectives. This CL implements option 2. The XLA CPU backend now immediately replaces any asynchronous collectives with their synchronous counterparts. PiperOrigin-RevId: 893768474
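Option 2 can be pictured as a simple rewrite over an instruction list: each asynchronous start becomes the synchronous collective, and each matching done disappears, with its users redirected to the synchronous result. The sketch below uses a hypothetical tuple-based instruction format and assumes opcodes named with `-start` and `-done` suffixes; it is not the actual XLA CPU pass.

```python
def make_collectives_sync(ops):
    """Rewrite async collective pairs into synchronous ops.

    ops is a list of (result_name, opcode, operand_names) tuples in program
    order. This instruction format is a hypothetical simplification.
    """
    rewritten = []
    alias = {}  # Result name of a removed "-done" op -> name of the sync op.
    for name, opcode, operands in ops:
        # Redirect uses of removed "-done" results to the sync op's result.
        operands = [alias.get(o, o) for o in operands]
        if opcode.endswith("-start"):
            # "all-reduce-start" becomes the synchronous "all-reduce".
            rewritten.append((name, opcode[: -len("-start")], operands))
        elif opcode.endswith("-done"):
            # The sync op already produced the value; drop the done op.
            alias[name] = operands[0]
        else:
            rewritten.append((name, opcode, operands))
    return rewritten
```

Given a program with `all-reduce-start` followed by `all-reduce-done`, the result contains a single `all-reduce`, and any later instruction that consumed the done value now consumes the `all-reduce` result instead.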

Updates LLVM usage to match [7ccd92e5e6e5](llvm/llvm-project@7ccd92e5e6e5) PiperOrigin-RevId: 893768833

codeXsidd and others added 22 commits March 21, 2026 10:38

docs: Fix typo 'occured' to 'occurred' in metrics

63901ef

Trigger CLA check

5c28519

[XLA:GPU] Use XLA_VLOG_DEVICE where device id is useful in logs.

8be2ee2

PiperOrigin-RevId: 893502380

[XLA:GPU] Do not run large HLO fusions concurrently in command buffer…

e22249f

…s to avoid instruction cache thrashing. PiperOrigin-RevId: 893502849

Automated Code Change

0e5326d

PiperOrigin-RevId: 893503478

[mpmd] Use reserved_hbm_bytes attribute to reserve per-fragment memory

33264a6

PiperOrigin-RevId: 893542078

Merge pull request #112909 from codeXsidd:fix-typo-metrics

d6c9fa0

PiperOrigin-RevId: 893562678

Consolidate sanitizer build checks for tfrt_session_python_test.

2028a1b

PiperOrigin-RevId: 893591324

Reverts cd712c8

46b5a5c

PiperOrigin-RevId: 893592092

Original change is causing presubmit tests failure. Rolling back for …

45e89d3

…further investigation. Reverts 7fc14e3 PiperOrigin-RevId: 893594429

Rollback of PR #39854

2048f98

Reverts c6d844d PiperOrigin-RevId: 893602331

Make HloProgram::name() return a unique and stable name even if the…

4603402

… module does not have a name PiperOrigin-RevId: 893611161

Remove unneeded condition variable from coordination service.

147e4d9

PiperOrigin-RevId: 893613883

Reverts e6f36bd

c3f2bc1

PiperOrigin-RevId: 893615727

Populate device_memory_bytes_limit attribute while reconstructing Top…

cfc9b69

…ologyDescription from proto PiperOrigin-RevId: 893622566

pull Bot locked and limited conversation to collaborators Apr 2, 2026

pull Bot added the ⤵️ pull label Apr 2, 2026

ekayaaslan and others added 6 commits April 2, 2026 12:42

Drop deduplicating any calls on outliner.

6b2cc0c

With this change, the Shardy outliner translates named computations into separate calls, leaving a flat call graph. PiperOrigin-RevId: 893645729

[XLA:CPU] Add Pi approximation to fusion microbenchmark.

ce5d9d1

PiperOrigin-RevId: 893678182

[IFRT IR] Add compile option to set strict memory reservations

b5979e4

PiperOrigin-RevId: 893686530

Run MakeArraysFromHostBuffer and CopyToHostBuffer tests with stri…

3794104

…des on all memory kinds PiperOrigin-RevId: 893691249

seantalts and others added 11 commits April 2, 2026 14:16

[XLA] Wrap loops containing only DCHECKs with ifndef NDEBUG.

f5df0e8

PiperOrigin-RevId: 893714548

[tsl] Clean up #includes in errors.h and status.h

f983c0f

Main goal is to not include Eigen when all we need is error codes. PiperOrigin-RevId: 893722480

Migrate multithreaded_compilation_test to PjRt runtime.

7a27b8a

PiperOrigin-RevId: 893732146

[XLA:CPU] Rename is_fusion_emitters to use_fusion_emitters for consis…

84f6f47

…tency. PiperOrigin-RevId: 893751639

Remove unused reset logic from coordination agent.

d69b6eb

PiperOrigin-RevId: 893764092

[IFRT IR] Do not apply identity IfrtArrayType conversion in lowering t…

b20eda2

…o IFRT. PiperOrigin-RevId: 893765489

Integrate LLVM at llvm/llvm-project@7ccd92e5e6e5

6ec44a0

Updates LLVM usage to match [7ccd92e5e6e5](llvm/llvm-project@7ccd92e5e6e5) PiperOrigin-RevId: 893768833

pull Bot merged commit 6ec44a0 into GesuBackups:master Apr 3, 2026

pull[bot] merged 39 commits into GesuBackups:master from tensorflow:master


20 participants
